A Study of Many-Core Hardware Accelerated Hadoop MapReduce
نویسندگان
چکیده
MapReduce is a widely used framework for massive data processing. It was originally designed to overcome the I/O bottleneck, and enabled us to process Bigdata with the commodity clusters systems. However, several existing work have recently shown that the emerging high speed storage and network devices are capable to remove the I/O bottleneck and made the CPU the next serious bottleneck in the MapReduce framework. In this paper, we propose hardware accelerated (HA) Hadoop MapReduce framework and implement it on a Tilera's many-core processor board to overcome the CPU bottleneck. Our proposed solution offloads the main parts of Map and Reduce procedures, including the data parsing, sorting and merging. Based on our experimental evaluations, we verify the feasibility of our proposal. Keyword Hardware Acceleration,MapReduce,Many-core,Hadoop
منابع مشابه
A Research of MapReduce with GPU Acceleration
MapReduce is an efficient distributed computing model on large data sets. The data processing is fully distributed on huge amount of nodes, and a MapReduce cluster is of highly scalable. However, single-node performance is gradually to be a bottleneck in computeintensive jobs, which makes it difficult to extend the MapReduce model to wider application fields such as largescale image processing ...
متن کاملAdaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...
متن کاملCloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...
متن کاملThe Anatomy of MapReduce Jobs, Scheduling, and Performance Challenges
Hadoop is a leading open source tool that supports the realization of the Big Data revolution and is based on Google’s MapReduce pioneering work in the field of ultra large amount of data storage and processing. Instead of relying on expensive proprietary hardware, Hadoop clusters typically consist of hundreds or thousands of multi-core commodity machines. Instead of moving data to the processi...
متن کاملProfiling and evaluating hardware choices for MapReduce environments: An application-aware approach
The core business of many companies depends on the timely analysis of large quantities of new data. MapReduce clusters that routinely process petabytes of data represent a new entity in the evolving landscape of clouds and data centers. During the lifetime of a data center, old hardware needs to be eventually replaced by new hardware. The hardware selection process needs to be driven by perform...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014